[LV] An attempt to cherry-pick the fix PR #132691 (cherry-pick from the main branch to the release/20.x branch) #135231
…lvm#130048) From what I understand, we only create VPReductionRecipes for in-loop reductions, and we don't currently support in-loop AnyOf reductions. We only create VPReductionRecipes in the !PhiR->isInLoop() section of adjustRecipesForReductions, and this comment from the initial patch seems to confirm this https://reviews.llvm.org/D108136#anchor-inline-1038338, so I think we can remove this check in the condition logic. I checked compiling SPEC 2017 with -prefer-inloop-predicates and the added assertion doesn't trigger.
) Currently fast() won't return true if all flags are set via setXXX, which is surprising. Update setters to set all bits if needed to make sure isFast() consistently returns the expected result. PR: llvm#131321
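The setter behaviour described above can be sketched with a toy flag set. This is only an illustration under stated assumptions: `ToyFlags`, its three flags, and the `FastMask` sentinel are hypothetical names and do not reproduce LLVM's `FastMathFlags` layout. It assumes "fast" is encoded as a mask wider than the union of the individual bits, which is one way the surprising result could arise.

```cpp
#include <cassert>

// Toy stand-in for a fast-math flag set. NOT LLVM's FastMathFlags;
// every name and bit value here is hypothetical.
struct ToyFlags {
  enum : unsigned {
    AllowReassoc = 1u << 0,
    NoNaNs = 1u << 1,
    NoInfs = 1u << 2,
    AllKnown = AllowReassoc | NoNaNs | NoInfs,
    FastMask = ~0u // assumed "fast" encoding: wider than AllKnown
  };

  unsigned Bits = 0;

  bool isFast() const { return Bits == FastMask; }

  // Naive setter: only touches the requested bit, so setting every
  // individual flag never reaches FastMask.
  void setNaive(unsigned Flag) { Bits |= Flag; }

  // Normalising setter: once all known flags are present, promote the
  // representation to FastMask so isFast() agrees.
  void set(unsigned Flag) {
    assert((Flag & ~static_cast<unsigned>(AllKnown)) == 0 && "unknown flag");
    Bits |= Flag;
    if ((Bits & AllKnown) == AllKnown)
      Bits = FastMask;
  }

  // Clearing a flag also drops the extra "fast" bits again.
  void clear(unsigned Flag) { Bits &= AllKnown & ~Flag; }
};
```

With the naive setter, setting all three flags one by one leaves `isFast()` false; with the normalising setter it becomes true, and clearing any flag makes it false again.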
This is split off from llvm#131300. A VPReductionRecipe will never have an AnyOf or FindLastIV recurrence, so when it calls createReduction it always calls createSimpleReduction. If we replace the call then it leaves createReduction with one user in VPInstruction::ComputeReductionResult, which we can inline and then remove.
This patch changes the parent of VPReductionRecipe from VPSingleDefRecipe to VPRecipeWithIRFlags, and prints, gets, drops, and controls flags through VPRecipeWithIRFlags. This removes the dependency on the underlying instruction. The patch also adds a new function `setFastMathFlags()` to VPRecipeWithIRFlags, because the entire reduction chain may contain multiple instructions and the underlying instruction may not carry the corresponding flags for this reduction. Split from llvm#113903.
…pUtils. NFC (llvm#132014) Split off from llvm#131300, this splits up RecurrenceDescriptor arguments so that arbitrary recurrence kinds may be used down the line.
This doesn't rely on State.CFG.
createInductionAdditionalBypassValues is only used for epilogue vectorization now. Move it out of ILV, which means we do not have to thread through ExpandedSCEVs and also don't have to track the bypass values in ILV. Instead, directly create them if needed after executing the epilogue plan. This moves more of the epilogue-specific logic out of the generic executePlan.
Instead of executing the whole entry VPIRBB twice, first only execute the VPExpandSCEVRecipes and replace their uses with the expanded VPValue, which will be a live-in. This allows removing special logic in VPExpandSCEVRecipe to support executing twice and allows moving the ExpandedSCEVs map out of VPTransformState. It will also allow adding other recipes to the entry VPBB in the future.
Update code to use VPBuilder, to simplify follow-up changes.
…CI (llvm#131300) VPReductionRecipes take a RecurrenceDescriptor, but only use the RecurKind and FastMathFlags in it when executing. This patch makes the recipe more lightweight by stripping it to only take the latter two. The motivation for this is to simplify an upcoming patch to support in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to create an Or reduction, and by using RecurKind we can create an arbitrary reduction without needing a full RecurrenceDescriptor.
Splits off reduction printing tests, to limit growth and add test case for printing find-last-IV (llvm#132689)
This moves the logic for computing the FindLastIV reduction result to its own opcode. A follow-up patch will update the new opcode to also take the start value, to fix llvm#126836. PR: llvm#132689
llvm#132690) Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed. Depends on llvm#132689 PR: llvm#132690
Add missing test coverage for llvm#126836.
This adds missing test coverage for llvm#132691.
FindLastIV introduces multiple uses of the start value, where in the original source there was only a single use, when the epilogue is vectorized. Each use of undef may produce a different result, so introducing multiple uses can produce incorrect results when the input is undef/poison. If the start value may be undef or poison, freeze it and use the frozen value, which will be the same at all uses.
See the following scenarios in Alive2:
* Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr
* Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v
* Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV
* Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN
The latter two show that the resume phi must be frozen, which means we cannot freeze in the preheader. We could move the freeze to the main iteration count check, but that would be a bit fragile to find, and other transforms can sink the freeze if needed.
Depends on llvm#132689 and llvm#132690. Fixes llvm#126836 PR: llvm#132691
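As a rough illustration of why duplicating uses of an undef start value is a problem, and why a single frozen value avoids it, here is a toy model. It is a sketch only: `ToyValue`, `toyFreeze`, and `usesAgree` are hypothetical names, and the "each use of undef may yield a different value" rule is simulated with a counter. It does not model LLVM IR, the vectorizer, or poison.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a value that is either concrete or "undef".
// Hypothetical illustration only; this is not LLVM IR semantics in full.
class ToyValue {
public:
  ToyValue() : IsUndef(true), Concrete(0) {}
  explicit ToyValue(int64_t V) : IsUndef(false), Concrete(V) {}

  bool isUndef() const { return IsUndef; }

  // Each use of an undef value may observe a different result; the
  // counter makes that visible deterministically.
  int64_t use() const { return IsUndef ? NextArbitrary++ : Concrete; }

private:
  bool IsUndef;
  int64_t Concrete;
  mutable int64_t NextArbitrary = 0;
};

// Freeze observes the value once and pins that result for all later uses.
ToyValue toyFreeze(const ToyValue &V) {
  ToyValue Frozen(V.use());
  assert(!Frozen.isUndef() && "a frozen value is always concrete");
  return Frozen;
}

// Stand-in for a transform that turns one use of the start value into two.
bool usesAgree(const ToyValue &Start) { return Start.use() == Start.use(); }
```

In this model, two uses of an undef `ToyValue` disagree, while two uses of `toyFreeze(Undef)` always agree, which is the property the fix relies on.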
nikic
left a comment
Breaks LoopUtils.h ABI in obvious ways and FMF.h ABI in less obvious ways.
Can this be fixed in a more minimal way than backporting 18 commits that include a lot of refactorings?
I'm not sure if it is feasible to strip the fix down, as it depends on quite a few refactoring patches. For 20.x, it might be best to just not enable epilogue vectorization for FindLastIV: #135666
Thanks @fhahn for your comment and your patch. It is a good reason for closing this one.
This is to backport the fix for #126836 to the release/20.x branch. Tested on the real thing.